AITopics | low-level policy

ReMA: Learning to Meta-think for LLMs with Multi-agent Reinforcement Learning

Neural Information Processing Systems | Jun-22-2026, 07:17:08 GMT

Recent research on reasoning in Large Language Models (LLMs) has sought to enhance their performance by integrating meta-thinking, which enables models to monitor, evaluate, and control their reasoning processes for more adaptive and effective problem-solving. However, current single-agent work lacks a specialized design for acquiring meta-thinking, resulting in low efficacy. To address this challenge, we introduce Reinforced Meta-thinking Agents (ReMA), a novel framework that leverages Multi-Agent Reinforcement Learning (MARL) to elicit meta-thinking behaviors, encouraging LLMs to think about thinking.

arxiv preprint arxiv, large language model, machine learning, (20 more...)

Country: Asia (0.67)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Information Technology (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

6d7c4a0727e089ed6cdd3151cbe8d8ba-Paper-Conference.pdf

Neural Information Processing Systems | Apr-28-2026, 12:56:16 GMT

machine learning, natural language, reinforcement learning, (14 more...)

Genre:

Research Report > New Finding (0.68)
Instructional Material (0.67)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Sample Complexity of Goal-Conditioned Hierarchical Reinforcement Learning

Neural Information Processing Systems | Feb-17-2026, 00:16:15 GMT

HRL lead to improved sample complexity, and how much of an improvement can it provide? Theoretical work on sample-complexity bounds in machine learning has been integral to the development of the field.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Germany > Bavaria > Upper Franconia > Bayreuth (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

6d7c4a0727e089ed6cdd3151cbe8d8ba-Supplemental-Conference.pdf

Neural Information Processing Systems | Feb-13-2026, 20:02:31 GMT

artificial intelligence, machine learning, trajectory, (18 more...)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.46)

HIQL: Offline Goal-Conditioned RL with Latent States as Actions

Seohong Park

Neural Information Processing Systems | Feb-13-2026, 20:02:28 GMT

We observe that part of this difficulty stems from the "signal-to-noise"

machine learning, natural language, reinforcement learning, (14 more...)

Country: North America > United States > California > Alameda County > Berkeley (0.04)

Genre:

Research Report > New Finding (0.68)
Instructional Material (0.67)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

e5dd4fbb6fb4cb805b982bfb41c20aad-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing Systems | Feb-12-2026, 12:47:01 GMT

dataset, humanoid, snippet, (12 more...)

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Technology:

Information Technology > Artificial Intelligence > Robots (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

ee39e503b6bedf0c98c388b7e8589aca-Paper.pdf

Neural Information Processing Systems | Feb-11-2026, 19:27:59 GMT

high-level policy, international conference, landmark, (13 more...)

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Language as an Abstraction for Hierarchical Deep Reinforcement Learning

YiDing Jiang, Shixiang (Shane) Gu, Kevin P. Murphy, Chelsea Finn

Neural Information Processing Systems | Feb-11-2026, 10:28:16 GMT

With the ability to learn concepts and sub-skills that can be composed to solve longer tasks, i.e. hierarchical RL, we can acquire temporally-extended behaviors.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Country: North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.98)
